TEXT-TO-WAVEFORM CONVERSION



This invention relates to a method and apparatus
for converting text to a waveform. More specifically,
it relates to the production of an output in the form of an
acoustic wave, namely synthetic speech, from an input in
the form of signals representing a conventional text.

This overall conversion is very complicated and
it is sometimes carried out in several modules wherein
the output of one module constitutes the input for the
next. The first module receives signals representing a
conventional text and the final module produces
synthetic speech as its output. This synthetic speech
may be a digital representation of the waveform followed
by conventional digital-to-analogue conversion in order
to produce the audible output. In many cases it is
desired to provide the audible output over a telephone
system. In this case it may be convenient to carry out
the digital-to-analogue conversion after transmission so
that transmission takes place in digital form.

There are advantages in the modular structure,
e.g. each module is separately designed and any one of
the modules can be replaced or altered in order to
provide flexibility or improvements, or to cope with
changing circumstances.

Some procedures utilise a sequence of three
modules, namely

(A) pre-editing,

(B) conversion of graphemes to phonemes, and

(C) conversion of phonemes to (digital)
waveform.

A brief description of these modules will now be
given.

Module (A) receives signals representing a
conventional text, e.g. the text of this specification,
and it modifies selected features. Thus module (A) may
specify how numbers are processed. For example, it
will decide if

"1345"

becomes

One three four five
Thirteen forty-five or
One thousand three hundred and forty-five.

It will be apparent that it is relatively easy to
provide different forms of module (A), each of which is
compatible with the subsequent modules so that different
forms of output result.
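
The three readings of "1345" listed above can be sketched in a few lines of Python. This is only an illustration of the kind of choice module (A) makes; the function names `as_digits`, `as_pairs` and `as_cardinal` are hypothetical, and the sketch assumes a four-digit input below 20000.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def below_hundred(n):
    """Spell out an integer in the range 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def as_digits(text):
    """Read the string digit by digit."""
    return " ".join(ONES[int(ch)] for ch in text)


def as_pairs(text):
    """Read a four-digit string as two two-digit numbers."""
    return f"{below_hundred(int(text[:2]))} {below_hundred(int(text[2:]))}"


def as_cardinal(text):
    """Read the string as a cardinal number (assumed below 20000)."""
    thousands, rest = divmod(int(text), 1000)
    hundreds, tail = divmod(rest, 100)
    parts = []
    if thousands:
        parts.append(f"{ONES[thousands]} thousand")
    if hundreds:
        parts.append(f"{ONES[hundreds]} hundred")
    if tail:
        words = below_hundred(tail)
        parts.append(f"and {words}" if parts else words)
    return " ".join(parts) if parts else "zero"


for reading in (as_digits, as_pairs, as_cardinal):
    print(reading("1345"))
```

Swapping one reading function for another changes the output of module (A) without affecting the later modules, which is the flexibility the text describes.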

Module (B) converts graphemes to phonemes.
"Grapheme" denotes data representations corresponding to
the symbols of the conventional alphabet used in the
conventional manner. The text of this specification is
a good example of "graphemes". It is a problem of
synthetic speech that the graphemes may have little
relationship to the way in which the words are
pronounced, especially in languages such as English.
Therefore, in order to produce waveforms, it is
appropriate to convert the graphemes into a different
alphabet, called "phonemes" in this specification,
which has a very close correlation with the sound of the
words. In other words it is the purpose of module (B)
to deal with the problem that the conventional alphabet
is not phonetic.

Module (C) converts the phonemes into a digital
waveform which, as mentioned above, can be converted
into an analogue format and thence into an audible
waveform.

This invention relates to a method and apparatus
for use in module (B) and this module will now be
described in more detail.

Module (B) utilises linked databases which are
formed of a large number of independent entries. Each
entry includes access data which is in the form of
representations, e.g. bytes, of a sequence of graphemes
and an output string which contains representations, e.g.
bytes, of the phonemes equivalent to the graphemes
contained in the access section. A major problem of
grapheme/phoneme conversion resides in the size of
database necessary to cope with a language. One simple,
and theoretically ideal, solution would be to provide a
database so large that it has an individual entry for
every possible word in the language, including all
possible inflections of every possible word in the
language. Clearly, given a complete database, every
word in the input text would be individually recognised
and an excellent phoneme equivalent would be output. It
should be apparent that it is not possible to provide
such a complete database. In the first place, it is not
possible to list every word in a language and even if
such a list were available it would be too large for
computational purposes.

Although the complete database is not possible,
it is possible to provide a database of useable
dimension which contains, for example, common words and
words whose pronunciation is not simply related to the
spelling. Such a database will give excellent
grapheme/phoneme conversion for the words included
therein but it will fail, i.e. give no output at all,
for the missing words. In any practical implementation
this would mean an unacceptably high proportion of
failure.

Another possibility uses a database in which the
access data corresponds to short strings of graphemes
each of which is linked to its equivalent string of
phonemes. This alternative utilises a
manageable size of database but it depends upon analysis
of the input text to match strings contained therein
with the access data in the database. Systems of this
nature can provide a high proportion of excellent
pronunciations with occurrences of slight and severe
mispronunciation. There will also be a proportion of
failures wherein no output at all is produced either
because the analysis fails or a needed string of
graphemes is missing from the access section of the
database.

A final possibility is conveniently known as a
"default" procedure because it is only used when
preferred techniques fail. A "default" procedure
conveniently takes the form of "pronouncing" the symbols
of the input text. Since the range of input symbols is
not only known but limited (usually less than 100 and in
many cases less than 50) it is not only possible to
produce the database but its size is very small in
relation to the capacity of modern data storage systems.
This default procedure therefore guarantees an output
even though that output may not be the most appropriate
solution. There are, however, cases in which it is the
most appropriate solution. Examples include names in
which initials are used, degrees and honours, and some
abbreviations for units. It will be appreciated that,
in these circumstances, it is usual to "pronounce"
the individual letters and on these occasions the default
procedure provides the best results.

Three different strategies for converting
graphemes to phonemes have just been identified and it
is important to realise that these alternatives are not
mutually exclusive. In fact it is desirable to use all
three alternatives according to a strict order of
precedence. Thus the "whole word" database is used
first and, if it gives an output, that output will be
excellent. When it fails, the "analysis" technique is
used, which may involve a small but acceptable number of
mispronunciations. Finally, if the "analysis" fails,
the default option of pronouncing the "letters" is
utilised and this can be guaranteed to give an output.
Although this may not be completely satisfactory, it
will, in a proportion of cases as explained above, give
the most appropriate result.
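
The strict order of precedence can be sketched as a simple fall-through. This is a minimal illustration, not the patented implementation: the function `convert_word`, the toy tables and the phoneme spellings are all assumptions made for the example, and `toy_analyse` merely stands in for the analysis technique.

```python
def convert_word(word, whole_words, analyse, letter_names):
    """Try the three strategies in strict order of precedence."""
    # 1. Whole-word database: an entry, if present, is used as it stands.
    if word in whole_words:
        return whole_words[word]
    # 2. Analysis into sub-strings: assumed to return None on failure.
    phonemes = analyse(word)
    if phonemes is not None:
        return phonemes
    # 3. Default: "pronounce" each symbol, which always gives an output
    #    provided every input symbol has an entry.
    return " ".join(letter_names[ch] for ch in word)


# Hypothetical toy data; the phoneme spellings are illustrative only.
WHOLE_WORDS = {"one": "w ah n"}
LETTER_NAMES = {"b": "b iy", "t": "t iy"}


def toy_analyse(word):
    """Stand-in for the analysis technique: knows a single word."""
    return "k ae t s" if word == "cats" else None


print(convert_word("one", WHOLE_WORDS, toy_analyse, LETTER_NAMES))
print(convert_word("cats", WHOLE_WORDS, toy_analyse, LETTER_NAMES))
print(convert_word("bt", WHOLE_WORDS, toy_analyse, LETTER_NAMES))
```

Each stage is consulted only when the one before it gives no output, so a failure of the analysis is not a failure of the system as a whole.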

This invention relates to the middle option in
the sequence outlined above. That is to say this
invention is concerned with the analysis of the data
representations corresponding to input text graphemes in
order to produce an output set of data representations
being the phonemes corresponding to the input text. It
is emphasised that the working environment of this
invention is the complete text-to-waveform conversion as
described in greater detail above. That is to say this
invention relates to a particular component of the whole
system.

According to this invention an input sequence of
bytes, e.g. data representations representing a string of
characters selected from a first character set such as
graphemes, is dissected into sub-strings for conversion
into an output sequence of bytes, e.g. data
representations representing a string of characters
selected from a second character set such as phonemes,
wherein said method includes retrograde analysis wherein
later occurring bytes are selected before earlier bytes
whereby the selection of the earlier occurring bytes is
at least partially determined by the previous selection
of later occurring bytes.

The method of the invention is particularly
suitable for the processing of an input string divided
into blocks, e.g. blocks corresponding to words, wherein
a block is analysed into segments beginning from the end
and working to the beginning wherein the choice of
segment is taken from the end of the remaining
unprocessed string.

The invention, which is defined in the claims,
includes the methods and apparatus for carrying out the
methods.

The data representations, e.g. bytes, utilised in
the method according to this invention take any signal
form which is suitable for use in computing circuitry.
Thus the data representations may be signals in the form
of electric current (amps), electric potential (volts),
magnetic fields, electric fields, or electro-magnetic
radiation. In addition, the data representations may
be stored, including transient storage as part of
processing, in a suitable storage medium, e.g. as the
degree of and/or the orientation of magnetisation in a
magnetic medium.

The theoretical basis and some preferred 
embodiments of the invention will now be described. In 
the preferred embodiments the input signals are divided 
into blocks which correspond to the individual words of 
the text and the invention works on each block 
separately; thus the process can be considered as "word- 
by-word" processing. 

It is now convenient to restate the requirement
that it is not necessary to produce an output for every
one of the blocks because, as described above, the whole
system includes further modules to deal with such
failures.

As a preliminary, it is convenient to illustrate
the theoretical basis of the invention by considering
the structure of words in the English language and by
commenting on the structures of a few specific words.
This analysis uses the distinction usually identified as
"vowels" and "consonants". For mechanical processing it
is necessary to store two lists of characters. One of
these lists contains the characters specified as
"vowels" and the other list contains those characters
designated as "consonants". All characters are,
preferably, included in one or other of the lists but,
in the preferred embodiment, the data representations
corresponding to "Y" are included in both lists. This
is because conventional English spelling sometimes
utilises the letter "Y" as a vowel and sometimes as a
consonant. Thus the first list (of vowels) contains a,
e, i, o, u and y, whereas the second list (of consonants)
contains b, c, d, f, g, h, j, k, l, m, n, p, q, r, s, t,
v, w, x, y, z. The fact that "Y" appears in both lists
means that the condition "not vowel" is different from
the condition "consonant".
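
The two character lists, and the reason "not vowel" differs from "consonant", can be shown with a short sketch. Python sets are assumed here as a stand-in for the stored lists, and the helper names are hypothetical.

```python
# The two lists as given in the text; "y" is deliberately in both.
VOWELS = set("aeiouy")
CONSONANTS = set("bcdfghjklmnpqrstvwxyz")


def is_vowel(ch):
    return ch.lower() in VOWELS


def is_consonant(ch):
    return ch.lower() in CONSONANTS


# Characters that appear in both lists.
both = sorted(VOWELS & CONSONANTS)
print(both)

# For "y" the two conditions disagree: it is a consonant, yet it is
# not "not a vowel".
print(is_consonant("y"), not is_vowel("y"))
```

For every letter other than "y" the two conditions coincide, so the distinction only matters when a "y" is being tested.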
The primary purpose of the analysis is to split
a block of data representations, i.e. a word, into
"rimes" and "onsets". It is important to realise that
the analysis uses linked databases which contain the
grapheme equivalents of rimes and onsets linked to their
phoneme equivalents. The purpose of the analysis is not
merely to split the data into arbitrary sequences
representing rimes and onsets but into sequences which
are contained in the database.
A rime denotes a string of one or more
characters each of which is contained in the list of
vowels or such a string followed by a second string of
characters not contained in the list of vowels. An
alternative statement of this requirement is that a rime
consists of a first string followed by a second string
wherein all the characters contained in the first string
are contained in the list of vowels and the first string
must not be empty and the second string consists
entirely of characters not found in the list of vowels
with the proviso that the second string may be empty.

An onset is a string of characters all of which 
are contained in the list of consonants. 
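
The two definitions describe the shape a candidate string must have, and that shape can be checked with regular expressions. The sketch below is an illustration only; the helper names are hypothetical, and an onset is assumed to be non-empty. A string with the right shape still counts as a rime or onset for the analysis only if it is also present in the database.

```python
import re

VOWELS = "aeiouy"
CONSONANTS = "bcdfghjklmnpqrstvwxyz"

# Rime: one or more vowels, then zero or more characters that are not vowels.
RIME_SHAPE = re.compile(f"[{VOWELS}]+[^{VOWELS}]*")
# Onset: characters drawn only from the consonant list (assumed non-empty).
ONSET_SHAPE = re.compile(f"[{CONSONANTS}]+")


def has_rime_shape(s):
    return RIME_SHAPE.fullmatch(s) is not None


def has_onset_shape(s):
    return ONSET_SHAPE.fullmatch(s) is not None


for candidate in ("ats", "eet", "igh", "str", "ata"):
    print(candidate, has_rime_shape(candidate), has_onset_shape(candidate))
```

Because "y" is in both lists, a lone "y" satisfies both shapes; which reading applies is settled by the databases and by the order in which the procedures run.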

The analysis requires that the end of a word
shall be a rime. It is permitted that the word contains
adjacent rimes, but it is not permitted that it contains
adjacent onsets. It has been specified that the end of
the word must be a rime but it should be noted that the
beginning of the word can be either a rime or an
onset; for instance "orange" begins with a rime
whereas "pear" begins with an onset.

In order to illustrate the underlying theory of
the invention four specimen words, arbitrarily selected
from the English language, will be displayed and
analysed into their rimes and onsets.
FIRST SPECIMEN

CATS

rime "ats"
onset "c"

It is to be expected that "ats" will be listed
as a rime and "c" will be listed as an onset. Therefore
replacing each by its phoneme equivalent will convert
"cats" into phonemes.

It should be noted that the rime "ats" has a
first string consisting of the single vowel "a" and a
second string which consists of two non-vowels, namely
"t" and "s".

SECOND SPECIMEN

STREET

rime "eet"
onset "str"

In this case the first string of the rime contains two
letters, namely "ee", and the second string is a single
non-vowel "t". The onset consists of a string of three
consonants.

The onset "str" and the rime "eet" should both
be contained in the database so that phoneme equivalents
are provided.

THIRD SPECIMEN

HIGH

rime "igh"
onset "h"

In this example the rime "igh" is one of the
arbitrary sounds of the English language but the
database can give a correct conversion to phonemes.

FOURTH SPECIMEN

HIGHSTREET

second rime "eet"
second onset "str"
first rime "igh"
first onset "h"
Clearly the word "highstreet" is a compound of
the previous two examples and its analysis is very
similar to these two examples. However, there is an
important extra requirement in that it is necessary to
recognise that there is a break between the fourth and
fifth letters in order to split the word into "high" and
"street". This split is recognised by virtue of the
contents of the database. Thus the consonant string
"ghstr" is not an onset in the English language and,
therefore, it will not be in the database so that it
cannot be recognised. Furthermore the string "hstr"
will not be in the database. However, "str" is a common
onset in English and it should be in the database.
Therefore "str" can be recognised as an onset and "str"
is the later part of the string "ghstr". Once the end
of the string has been recognised as an onset the
earlier part is identified as part of the preceding rime
and the word "high" can be split as described above. It
is the purpose of this example to illustrate that the
splitting of an internal string of consonants is
sometimes important and that the split is achieved by
the use of the database.
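
The split of the internal consonant string can be sketched as a search for the longest end portion that is a known onset. This is a simplified illustration under the assumption that the onset database is a plain set of strings; the function name `split_consonant_run` is hypothetical.

```python
def split_consonant_run(run, onsets):
    """Split a consonant run into (residue, onset).

    The onset is the longest end portion of the run found in the onset
    database; the residue is left over for the preceding rime. Returns
    None when no end portion is a known onset.
    """
    for start in range(len(run)):
        if run[start:] in onsets:
            return run[:start], run[start:]
    return None


ONSETS = {"c", "str", "h"}  # toy database taken from the specimen words
print(split_consonant_run("ghstr", ONSETS))
```

The candidates are tried in the order "ghstr", "hstr", "str", so the first hit is the longest matching end portion and "gh" is handed back to the rime "igh".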

We have now given a description of the theory
which underlies the techniques of the invention and it
is now appropriate to indicate how this is carried into
effect using automatic computing equipment, which is
illustrated in the accompanying diagrammatic drawing.

The computing equipment operates on strings of
signals, e.g. electrical pulses. The smallest unit of
computation is a string of signals corresponding to a
single grapheme of the original text. For convenience
such a string of signals will be designated as a "byte"
no matter how many bits it contains. Originally the term
"byte" indicated a sequence of 8 bits. Since 8 bits
provide a count of up to 255, this is sufficient to
accommodate most alphabets. However, the "byte" does not
necessarily contain 8 bits.
The processing described below is carried out
block-by-block wherein each block is a string of one or
more bytes. Each block corresponds to an individual
word (or potential word, since it is possible that the
data will contain blocks which are not translatable so
that the conversion must fail). The purpose of the
method is to convert an input block whose bytes
represent graphemes into an output block whose bytes
represent phonemes. The method works by dividing the
input block into sub-strings, converting each sub-string
in a look-up table and then concatenating to produce the
output block.

The operational mode of the computing equipment
has two operation procedures. Thus it has a first
procedure which includes two phases and the first
procedure is utilised for identifying byte strings
corresponding to rimes. The second procedure has only
one phase and it is used for identifying byte strings
corresponding to onsets.

As indicated in the drawing, the computing
equipment comprises an input buffer 10 which holds
blocks from previous processing until they are ready to
be processed. The input buffer 10 is connected to a
data store 11 and it provides individual blocks to the
data store 11 on demand.

An important part of the computing equipment is
storage means 12. This contains programming
instructions and also the databases and lists which are
needed to carry out the processing. As will be
described in greater detail below, storage means 12 is
divided into various functional areas.

The data processing equipment also includes a
working store 14 which is required to hold sub-sets of
bytes acquired from data store 11, for processing and
for comparison with byte strings held in databases
contained in the storage means 12. Single bytes, i.e.
signal strings corresponding to individual graphemes, are
transferred from the input buffer 10 to the working
store 14 via check store 13 which has capacity for one
byte. The byte in check store 13 is checked against
lists contained in storage means 12 before transfer to
the working store 14.

After successful matching with items contained
in the storage means 12, strings are transferred from
the working store 14 to the output store 15. For use
when matching fails the equipment includes means to
return a byte from the working store 14 to the data
store 11.

In addition to other areas, e.g. for program
instructions, the storage means 12 has four major
storage areas. These areas will now be identified.

First the storage means has areas for two
different lists of bytes. These are a first storage
area 12.1 which contains a list of bytes
corresponding to the vowels and a second storage area
12.2 which contains a list of bytes corresponding to the
consonants. (The vowels and the consonants have been
previously identified in this specification).

The storage means 12 also contains two areas of
storage which constitute two different, and substantial,
linked databases. First there is the rime database 12.3
which is further divided into regions designated 12.31,
12.32, 12.33, etc. Each region has an input section
containing byte strings corresponding to "rimes" in
graphemes and, as shown in the drawing, this includes
12.31 containing "ATS", 12.32 containing "EET", 12.33
containing "IGH" and many more sections not illustrated
in the drawing.

The storage means 12 also contains a second
major area 12.4, which contains byte strings equivalent
to the onsets. As with the rimes, the onset database
12.4 is also divided into many regions. For example, it
comprises 12.41 containing "C", 12.42 containing "STR"
and 12.43 containing "H".
Each of the input sections (of 12.3 and 12.4) is
linked to an output section which contains a string of
bytes corresponding to the content of its input section.

It has already been stated that the operational
method includes two different procedures. The first
procedure utilises storage areas 12.1 and 12.3 whereas
the second procedure utilises storage areas 12.2 and
12.4. It is emphasised that the areas of the database
which are actually used are defined entirely by the
procedure in operation. The procedures are used
alternately and procedure number 1 is used first.

SPECIFIC EXAMPLE

It will be noted that this specific example
relates to the word selected as the fourth specimen in
the description given above. Therefore its rimes and
onsets are already defined and the specific example
explains how these are achieved by mechanical
computation.

The analysis begins when the input buffer 10
transfers the byte string corresponding to the word
"HIGHSTREET" into the data store 11. Thus, at the start
of the process, the important stores have the contents
as follows:

STORE    CONTENT
11       HIGHSTREET
13       -
14       -
15       -

(The symbol "-" indicates that the relevant store is
empty).

The analysis begins with the first procedure
because the analysis always begins with the first
procedure. As mentioned above, the first procedure uses
storage regions 12.1 and 12.3. The first procedure has
two phases during which bytes are transferred from the
data store 11 to the working store 14 via the check
store 13. The first phase continues for so long as the
bytes are not found in storage region 12.1.

The procedure is retrograde, which means that
it works from the back of the word, and therefore the
first transfer is "T", which is not contained in region
12.1. The second transfer is "E", which is contained in
the region 12.1, and therefore the second phase of the
first procedure is initiated. This continues for as
long as the byte in working store 14 is matched in 12.1;
therefore the second "E" is transferred but the check
fails when the next byte "R" is passed. At this stage
the state of the various stores is as follows.

STORE    CONTENT
11       HIGHST
13       R
14       EET
15       -

The contents of the working store 14 are used to
access storage area 12.3 and a match is found in region
12.32. Thus the match has succeeded and the content of
the working store 14, namely "EET", is transferred to a
region of the output store 15 so that the state of the
various stores is as follows.

STORE    CONTENT
11       HIGHST
13       R
14       -
15       EET

It will be noticed that the first rime has been found
mechanically.

As mentioned above, the non-matching of "R" in
the check store 13 terminated the first performance of
the first procedure. The analysis continues but the
second procedure is now used because the two procedures
always alternate. The second procedure utilises the
storage regions 12.2 and 12.4. The byte corresponding
to "R" in check store 13 now matches because region 12.2
is now in use and this byte is contained therein.
Therefore "R" is transferred to the working store 14 and
the second procedure continues so long as the byte in
check store 13 matches. Thus the letters "T", "S", "H"
and "G" are all transferred via the check store 13. At
this point the byte corresponding to "I" arrives in the
check store 13 and the check fails because the byte
corresponding to "I" is not contained in storage region
12.2. Since the check fails this performance of the
second procedure terminates. The contents of the
various stores are:

STORE    CONTENT
11       "H"
13       "I"
14       "GHSTR"
15       "EET"

The second procedure will attempt to match the
content of the working store 14 with the database
contained in 12.4 but no match will be achieved.
Therefore the second procedure continues with its
remedial part wherein the bytes are transferred back to
the data store 11 via the check store 13. At each
transfer it is attempted to locate the content of the
working store 14 in storage area 12.4. A match will be
achieved when the letters G and H have been returned
because the string equivalent to "STR" is contained in
region 12.42. Having achieved a match the content of
the working store is put out into a region of the output
store 15. At this point the content of the various
stores is as follows:

STORE    CONTENT
11       "HIG"
13       "H"
14       -
15       "STR" and "EET"

The second procedure was terminated by finding the match
so the analysis now goes back to the first procedure and
more particularly to the first phase of the first
procedure. In this way the letters "H" and "G" are
transferred to the working store 14 and the first phase
ends. The second phase passes "I" and it terminates
when "H" is transferred to the check store 13. At this
stage the various stores have contents as follows:

STORE    CONTENT
11       -
13       "H"
14       "IGH"
15       "STR" and "EET"

The first procedure now attempts to match the content of
the working store 14 with the database in the storage
area 12.3 and a match is found in region 12.33.
Therefore the content of the working store 14 is
transferred to a region of the output store 15.

The analysis now continues with the second
procedure and the letter "H" (in the check store 13) is
located in storage region 12.2 (note that this region is
now in use because the analysis has now gone back to the
second procedure). The analysis can now terminate
because the data store 11 has no further bytes to
transfer and the content of the working store, namely
"H", is found in region 12.43 of the storage means 12.
Thus "H" is transferred to the output store 15, which
contains the correct four strings found by mechanical
analysis.
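
The walk-through above can be condensed into a short sketch. It is a simplified reading of the text rather than the patented implementation: Python lists stand in for the data store 11, working store 14 and output store 15, the check store 13 is folded into the loop conditions, and the function name `analyse` and the lower-case toy databases are assumptions. Where the text leaves a point open (for example an empty onset between adjacent rimes) the sketch simply moves on to the next procedure. The three `return None` exits correspond loosely to the failure modes described later in the text.

```python
VOWELS = set("aeiouy")                      # first list (area 12.1)
CONSONANTS = set("bcdfghjklmnpqrstvwxyz")   # second list (area 12.2)


def analyse(word, rimes, onsets):
    """Split a word into database segments, working from the end.

    Returns the segments in reading order, or None if the analysis fails.
    """
    data = list(word)    # data store 11; bytes are taken from the end
    output = []          # output store 15; segments are found last-first
    use_rime = True      # the first (rime) procedure always runs first
    while data:
        work = []        # working store 14
        if use_rime:
            # Phase 1: transfer bytes that are not in the vowel list.
            while data and data[-1] not in VOWELS:
                work.insert(0, data.pop())
            if not data:
                return None   # start of word reached without a vowel
            # Phase 2: transfer bytes that are in the vowel list.
            while data and data[-1] in VOWELS:
                work.insert(0, data.pop())
            table = rimes
        else:
            # Single phase: transfer bytes that are in the consonant list.
            while data and data[-1] in CONSONANTS:
                work.insert(0, data.pop())
            if not data and "".join(work) not in onsets:
                return None   # unmatched onset at the start of the word
            table = onsets
        # Remedial part: return leading bytes to the data store until the
        # rest of the working store is found in the database in use.
        while work and "".join(work) not in table:
            data.append(work.pop(0))
        if work:
            output.append("".join(work))
        elif use_rime:
            return None       # no rime could be matched
        use_rime = not use_rime
    return output[::-1]


RIMES = {"ats", "eet", "igh"}   # toy rime database (12.31 to 12.33)
ONSETS = {"c", "str", "h"}      # toy onset database (12.41 to 12.43)
print(analyse("highstreet", RIMES, ONSETS))
```

With these toy databases the call yields the four strings of the worked example, and the remedial loop is what hands "g" and "h" back so that "str" can be matched as an onset.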

The necessary output strings having been
located, it is only necessary to convert them using the
fact that storage areas 12.3 and 12.4 are linked
databases. Each region not only has the strings now
contained in the output store, but each region has
linked output regions containing strings corresponding
to the appropriate phonemes. Therefore each string in
the output store is used to access its appropriate
region and hence produce the necessary output. The
final step merely utilises a look-up table and this is
possible because the important analysis has been
completed.

As indicated above, the identified strings serve
as access to the linked database and, in a simple
system, there is one output string for each access
string. However, pronunciation sometimes depends on
context and improved conversion can be achieved by
providing a plurality of outputs for at least some of
the access strings. Selecting the appropriate output
string depends upon analysing the context of the access
string, e.g. to take into account the position in the
word or what follows or what precedes. This further
complication does not affect the invention, which is
solely concerned with the division into appropriate
sections. It merely complicates the look-up process.
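
A look-up with a plurality of outputs per access string might be sketched as follows. Everything here is hypothetical: the table layout, the condition functions and the phoneme spellings are assumptions chosen only to show how context (here, the segment that follows) could select between outputs.

```python
def always(context):
    """Condition that accepts any context; used for the fallback output."""
    return True


def before_e_i_or_y(context):
    """True when the following segment starts with e, i or y."""
    nxt = context["next"]
    return nxt is not None and nxt[0] in "eiy"


# Hypothetical linked table: access string -> ordered (condition, output).
TABLE = {
    "c": [(before_e_i_or_y, "s"), (always, "k")],
    "ats": [(always, "a t s")],
    "ell": [(always, "e l")],
}


def look_up(segments, table):
    """Convert each segment, choosing an output by its context."""
    phonemes = []
    for i, seg in enumerate(segments):
        context = {
            "previous": segments[i - 1] if i > 0 else None,
            "next": segments[i + 1] if i + 1 < len(segments) else None,
            "is_first": i == 0,
            "is_last": i == len(segments) - 1,
        }
        # The first output whose condition accepts the context is chosen.
        chosen = next(out for cond, out in table[seg] if cond(context))
        phonemes.append(chosen)
    return phonemes


print(look_up(["c", "ats"], TABLE))
print(look_up(["c", "ell"], TABLE))
```

The division into segments is the same in both calls; only the look-up step differs, which matches the statement that the extra outputs merely complicate the look-up process.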

As was explained above, the invention is not
necessarily required to produce an output because, in
the case of failure, the complete system contains a
default technique, e.g. providing a phoneme equivalent
for each grapheme. In order to complete the description
of the technique, it is considered desirable to provide
a brief indication of the circumstances in which this
failure occurs and use of a default technique is
required.

Failure Mode 1

The first failure mode will occur when the
content of the data store does not contain a vowel, which
implies that it is not a word. As always, the analysis
starts by using the first procedure and, more
specifically, the first phase of the first procedure and
this will continue so long as there is no match with the
first list 12.1. Since the string in data store 11
contains no match, the first phase will continue until
the beginning of the word and this indicates that there
is a failure.
The second failure mode occurs when:

(i) the second procedure is in use;

(ii) the beginning of the word is reached; and

(iii) there is no match for the content of the working
store 14 in the database 12.4.

This contrasts with failure to match during the
middle of the word, which implies that a vowel is
contained in the check store 13. Failure at this stage
permits the returning of bytes for later analysis by the
first procedure and there is no failure, at least not at
this point in the analysis. When the beginning of the
word is reached, there is no possibility of further
analysis and hence the analysis has to fail.

Third Failure Mode

The third failure mode occurs when the first
procedure is in use and it is not possible to match the
contents of the working store 14 with a string contained
in the database 12.3. Under these circumstances the
first procedure will transfer bytes back to the check
store 13 and the data store 11 and this transfer can
continue until working store 14 becomes empty and the
analysis also fails.

In the second failure mode, it was explained
that the second procedure is allowed to return bytes to
input for later analysis by the first procedure.
However, the transferred bytes must be matched at some
time and this means during the next performance of the
first procedure. The third failure mode corresponds to
the case where it is not possible to achieve the later
match.

Thus the method of the invention provides
analysis of a data string into segments which can be
converted using look-up tables. It is not necessary
that the analysis shall succeed in every case but, given
good databases, the method will work very frequently and
enhance the performance of a complete system which
comprises the other modules necessary for text to speech
conversion.
CLAIMS

1. A method of processing an input signal
representing a string of characters selected from a
first character set so as to identify sub-strings for
conversion into an output signal representing a string
of characters selected from a second character set,
wherein said method divides said input signal into sub-strings
by retrograde analysis, said retrograde analysis
comprising the selection of later occurring portions of
the input signal before earlier occurring portions
thereof wherein the prior selection of a later portion
at least partially defines the selection of an earlier
occurring portion; said later occurring portions being
contained in one of said sub-strings and said earlier
occurring portion being contained in a different one of
said sub-strings.

2. A method according to claim 1, wherein said
input signal is composed of a string of bytes each of
said bytes corresponding to a character of the first
character set.

3. A method according to either claim 1 or claim 2,
wherein the method is performed in conjunction with
signal storage means which includes first, second, third
and fourth storage areas wherein:

(i) the first storage area contains a
plurality of bytes each of which
represents a character selected from the
first character set;

(ii) the second storage area contains a
plurality of bytes each of which
represents a character selected from the
first character set, the total content of
said second storage area being different
from the total content of said first
storage area;

(iii) the third storage area contains strings
consisting of one or more bytes
representing characters of the first
character set wherein the or the first
byte of each string is contained in the
first storage area; and

(iv) the fourth storage area contains strings
of one or more bytes the or each of which
is contained in the second storage area.

4. A method according to claim 3, wherein the input
signal is divided into blocks and processing of at least
some of said blocks comprises:

(a) identifying an internal string of
consecutive bytes each of which is
contained in the second storage area, said
string being immediately preceded by a
predecessor byte contained in the first
storage area and immediately followed by
a successor byte contained in the first
storage area;

(b) identifying the longest end string of said
internal string with strings contained in
the fourth storage area;

(c) defining an initial portion of said
internal string being the residue of said
internal string after the separation of
the end string defined in (b) and
combining said initial string with the
predecessor byte specified in (a) and
identifying a string including said
predecessor byte and said initial portion
with a string stored in said second
storage area.
5. A method of converting an input signal
representing a string of characters selected from the
first character set into an equivalent signal
representing a string of characters selected from the
second character set; which method comprises identifying
sub-strings by a method according to any one of the
preceding claims and converting sub-strings by a linked
database which has input sections each of which contains
one of said sub-strings, each input section being linked
to an output section which contains the output
equivalent of the content of the input section.

6. A method according to claim 5, wherein the input
signal is divided into input blocks and wherein each
block is separately converted wherein at least some of
said blocks are converted as a whole without sub-division
and at least some of the said blocks are
converted by a method according to claim 5.